ABSTRACT

Single  carrier  cyclic  prefixed  (SCCP)  communications  are  a  close 
relative  of  multicarrier  communications.  Both  types  of  systems  are 
robust  to  multipath,  provided  that  the  channel  delay  spread  is  shorter 
than  the  guard  interval  between  transmitted  blocks.  If  this  condition 
is  not  met,  a  channel  shortening  equalizer  can  be  used  to  shorten  the 
channel  to  the  desired  length.  Previous  work  on  channel  shortening 
has  largely  been  in  the  context  of  digital  subscriber  lines,  a  wireline 
system  that  allows  bit  allocation,  thus  it  has  focused  on  maximizing 
the  bit  rate  for  a  given  bit  error  rate  (BER).  We  propose  and  evaluate 
a  channel  shortener  that  attempts  to  directly  minimize  the  BER  for 
an  SCCP  system.  The  problem  is  shown  to  be  analytically  distinct 
from  the  analogous  problem  in  multicarrier  systems. 

Index  Terms —  equalization 

1.  INTRODUCTION 

There are two types of cyclic prefixed systems: multicarrier modulation
and single-carrier cyclic prefixed (SCCP) modulation, a.k.a.
single-carrier frequency domain equalization (SC-FDE) [1], [2], [3].
Examples of wireline multicarrier systems include power line
communications (HomePlug) and digital subscriber lines (DSL). Examples
of wireless multicarrier systems include wireless local area networks
(IEEE 802.11a/g, HIPERLAN/2, MMAC), wireless metropolitan area networks
(IEEE 802.16), digital video/audio broadcasting in Europe, satellite
radio (Sirius and XM Radio), and multiband ultra wideband
(IEEE 802.15.3a). SCCP modulation has not been widely implemented, but
it is gaining support in the literature.

Cyclic prefixed systems are robust to multipath, provided that the
delay spread of the channel is less than the length of the cyclic
prefix (CP) between transmitted blocks. If the channel is short, then
channel equalization can be done tone-wise in the frequency domain by a
bank of complex scalars. This is called a frequency-domain equalizer
(FEQ). However, if the channel is longer than the CP, additional
equalization is required, often in the form of a channel shortening
equalizer (CSE) [a.k.a. a time-domain equalizer (TEQ)], which is a
filter at the receiver front end. A survey of CSE design for DSL can be
found in [4].

Channel shortening was first applied to maximum likelihood sequence
estimation [5]. More recently, it has been used to shorten the long
wireline impulse responses encountered by DSL [6], [7]. While early
designs were based on heuristic cost functions, recent designs have
addressed maximizing the bit rate for a given bit error rate (BER)
[8], [9], which is appropriate in wireline multicarrier systems that
allow bit allocation.

Wireless systems generally have a fixed bit allocation, and receiver
performance is measured in terms of BER. Moreover, in wireline systems,
once bit allocation has been done, the CSE can be used to minimize the
BER of that bit allocation. Previously, the authors investigated
BER-minimizing CSE design for multicarrier systems [10]. However, the
additional IFFT at the end of SCCP systems causes coupling that
drastically changes this design problem compared to multicarrier
systems. Hence, the main goals of this paper are to model the BER for
SCCP systems, and to develop and assess a CSE that minimizes this BER.

2.  SYSTEM  MODEL 

We assume a multiple-input multiple-output (MIMO) channel model with
$L$ transmit antennas and $P$ receive antennas. Throughout, $(\cdot)^*$,
$(\cdot)^T$, $(\cdot)^H$, and $E\{\cdot\}$ denote complex conjugate,
matrix transpose, conjugate transpose, and statistical expectation,
respectively.

The system model is shown in Fig. 1. The complex finite-alphabet data
symbols (usually multi-level QAM data) are blocked into groups of
length $N$. The $k$th such block for transmitter $l$ is denoted
$\mathbf{x}_l(k)$. A cyclic prefix is inserted by copying the last
$\nu$ samples of the block to the beginning of the block, and then the
$S = N + \nu$ samples are transmitted serially. The $i$th transmitted
data sample is denoted $x_l(i)$. Note that $l$ is the index of the
transmit antenna, $p$ is the index of the receive antenna, $k$ is the
block index, $n$ is the tone index, $i$ is the sample index, and
$j = \sqrt{-1}$ is the unit imaginary number.

The  redundancy  in  the  transmitted  signal  due  to  the  CP  can  be 
represented  by 

$$x_l(Sk + i) = x_l(Sk + i + N), \tag{1}$$

$$1 \le l \le L, \quad 1 \le i \le \nu, \quad -\infty < k < \infty.$$
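
The cyclic prefix insertion and the redundancy property (1) can be illustrated with a minimal NumPy sketch. The function name `add_cyclic_prefix` and the test data are illustrative assumptions, not part of the paper, and the indices are 0-based rather than 1-based.

```python
import numpy as np

def add_cyclic_prefix(block: np.ndarray, nu: int) -> np.ndarray:
    """Prepend the last nu samples of a length-N block (requires nu >= 1)."""
    return np.concatenate([block[-nu:], block])

N, nu = 8, 2
block = np.arange(N) + 1j * np.arange(N)   # arbitrary complex test data
tx = add_cyclic_prefix(block, nu)          # S = N + nu samples

# Property (1) with 0-based indices: the first nu transmitted samples
# equal the samples located N positions later in the same block.
assert np.array_equal(tx[:nu], tx[N:N + nu])
```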

Let $\mathbf{h}_{l,p}$ be an FIR filter of length $L_h$, which models
the channel from transmit antenna $l$ to receive antenna $p$, and let
$\mathbf{w}_p$ be an FIR filter of length $L_w$, which is the CSE to be
designed for antenna $p$. Let $\mathbf{H}_{l,p}$ be the tall channel
convolution matrix for $\mathbf{h}_{l,p}$, which is an $L_c \times L_w$
Toeplitz matrix, where $L_c = L_h + L_w - 1$.
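
As a hedged illustration of the tall convolution matrix, the sketch below builds an $(L_h + L_w - 1) \times L_w$ Toeplitz matrix for a single channel and checks that multiplying it by the CSE taps gives the channel-CSE convolution. The helper name `convolution_matrix` and the CSE values are assumptions chosen for the example.

```python
import numpy as np

def convolution_matrix(h: np.ndarray, L_w: int) -> np.ndarray:
    """Tall (L_h + L_w - 1) x L_w Toeplitz matrix H with H @ w == conv(h, w)."""
    L_h = len(h)
    H = np.zeros((L_h + L_w - 1, L_w), dtype=complex)
    for col in range(L_w):
        # Each column is a copy of h shifted down by one more sample.
        H[col:col + L_h, col] = h
    return H

h = np.array([1.0, -0.3, 0.7])             # example channel taps
w = np.array([0.5, 0.2, -0.1, 0.05])       # arbitrary CSE taps (assumed)
H = convolution_matrix(h, len(w))
c = H @ w                                  # effective (shortened) channel
```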

Fig. 1. Complex baseband SCCP system model.

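Define the transmitted signal vectors

$$\mathbf{x}_l(i) = [x_l(i), \cdots, x_l(i - L_c + 1)]^T, \tag{2}$$

$$\mathbf{x}(i) = [\mathbf{x}_1^T(i), \cdots, \mathbf{x}_L^T(i)]^T, \tag{3}$$

and similarly for $\boldsymbol{\eta}_p(i)$, $\boldsymbol{\eta}(i)$,
$\mathbf{y}_p(i)$, $\mathbf{y}(i)$. We can compactly write the CSE
input vector as

$$\mathbf{y}(i) = \begin{bmatrix} \mathbf{H}_{1,1}^T & \cdots & \mathbf{H}_{L,1}^T \\ \vdots & & \vdots \\ \mathbf{H}_{1,P}^T & \cdots & \mathbf{H}_{L,P}^T \end{bmatrix} \mathbf{x}(i) + \boldsymbol{\eta}(i). \tag{4}$$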


Passing  the  signal  through  the  bank  of  CSEs  yields 

$$u(i) = \mathbf{w}^T \mathbf{y}(i). \tag{5}$$

After channel shortening, the cyclic prefix is discarded and a discrete
Fourier transform (DFT), implemented by the fast Fourier transform
(FFT), is used to convert the data to the frequency domain. We use
$\mathcal{F}$ to denote the unitary DFT matrix, with element $(m, n)$
given by $\frac{1}{\sqrt{N}} e^{-j 2\pi (m-1)(n-1)/N}$. The DFT requires
an estimate of the transmission delay $\Delta$, since the length $N$
DFT input vector for block $k$ is

$$\mathbf{u}(k) = [u(Sk + \nu + 1 + \Delta), \cdots, u(S(k+1) + \Delta)]^T. \tag{6}$$

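The delay $\Delta$ is a design parameter whose choice affects the
values of the optimal CSE as well as the performance that can be
attained. To invert the channel in the frequency domain, the DFT is
computed, the FEQ $\mathbf{d}$ is applied, and an inverse DFT (IDFT) is
computed,

$$\tilde{\mathbf{u}}(k) = \mathcal{F}\,\mathbf{u}(k), \tag{7}$$

$$\bar{\mathbf{u}}(k) = \mathbf{d} \odot \tilde{\mathbf{u}}(k), \tag{8}$$

$$\hat{\mathbf{x}}_1(k) = \mathcal{F}^H \bar{\mathbf{u}}(k), \tag{9}$$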

where $\odot$ denotes element-by-element multiplication, and we assume
that the receiver is attempting to recover the data from transmitter
$l = 1$. In a multiuser scheme, the data for $l = 2, \cdots, L$ can be
ignored, or a multi-user detection technique can be used to mitigate
the interference. In a single user scheme, the data $x_l$ may be the
same on all transmitters or an Alamouti transmit diversity space-time
code may be used [11]. Interleaving and forward error correction blocks
can be included, although for conciseness, they are not depicted here.
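
The receiver chain of (7)-(9) can be sketched for the simplest case: one transmit and one receive antenna, no noise, delay $\Delta = 0$, and a channel already shorter than the CP, so that no CSE is needed. The zero-forcing FEQ used here is an assumption made to keep the example short; it is not the unbiased MMSE FEQ derived later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, nu = 16, 4
h = np.array([1.0, -0.3, 0.7])             # L_h - 1 <= nu, so no CSE is required

# Transmitter: one 4-QAM block plus cyclic prefix, then a noiseless channel.
bits = rng.integers(0, 2, size=(N, 2))
x = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
tx = np.concatenate([x[-nu:], x])
rx = np.convolve(tx, h)[:N + nu]

# Receiver: discard the CP, unitary DFT, one-tap FEQ per tone, unitary IDFT.
u = rx[nu:]
u_freq = np.fft.fft(u, norm="ortho")
d = 1.0 / np.fft.fft(h, N)                 # zero-forcing FEQ (assumed for the sketch)
x_hat = np.fft.ifft(d * u_freq, norm="ortho")
```

Because the CP turns the linear convolution into a circular one, `x_hat` matches `x` up to floating-point error in this noiseless setting.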

3.  BER  MODEL 

The  goal  of  this  section  is  to  model  the  BER  of  an  SCCP  system, 
which  we  will  attempt  to  optimize  in  Section  4. 

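The BER will be averaged over the $N$ elements of the final IFFT
output, $\hat{\mathbf{x}}_1(k)$. We assume that the total residual
interference and noise on each output sample is Gaussian, and that
$M$-level QAM signalling is used on each tone. The probability of error
on the PAM component of sample $m$ is given by [12, pp. 225-226]

$$P_{\sqrt{M}}(m) = 2\left(1 - \frac{1}{\sqrt{M}}\right) Q\left(\sqrt{\frac{3}{M-1}\,\mathrm{SNR}_m}\right), \tag{10}$$

hence the SER of sample $m$ is

$$P_M(m) = 2 P_{\sqrt{M}}(m) - \left(P_{\sqrt{M}}(m)\right)^2, \tag{11}$$

where $\mathrm{SNR}_m$ is the effective signal-to-interference-and-noise
ratio on sample $m$ (which we will refer to as the output SNR); and
$Q(x)$ is the Q-function, which is the integral of a unit Gaussian PDF
from $x$ to infinity. For the $M = 4$ case, (10) is the BER for sample
$m$, and it reduces to $Q(\sqrt{\mathrm{SNR}_m})$, which we use here for
simplicity of notation. Averaging (10) and (11) over the $N$ output
samples, the BER and the SER for the output of an SCCP system with
$M = 4$ are

$$\mathrm{BER} = \frac{1}{N} \sum_{m=1}^{N} Q\left(\sqrt{\mathrm{SNR}_m}\right), \tag{12}$$

$$\mathrm{SER} = \frac{1}{N} \sum_{m=1}^{N} \left[ 2 Q\left(\sqrt{\mathrm{SNR}_m}\right) - Q^2\left(\sqrt{\mathrm{SNR}_m}\right) \right]. \tag{13}$$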


Either  can  be  used  as  the  objective  function,  although  we  focus  on 
the  BER.  At  this  point,  we  need  a  model  for  the  output  SNR. 
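
A minimal sketch of the averaged error-rate model for the $M = 4$ case is given below, assuming the per-sample output SNRs are already available as linear (not dB) values. The function names and the example SNR values are illustrative assumptions.

```python
import math

def q_function(x: float) -> float:
    """Tail probability of a unit Gaussian, from x to infinity."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_4qam(snr_linear):
    """Average of Q(sqrt(SNR_m)) over the output samples."""
    return sum(q_function(math.sqrt(s)) for s in snr_linear) / len(snr_linear)

def ser_4qam(snr_linear):
    """Average of 2Q - Q^2 over the output samples."""
    total = 0.0
    for s in snr_linear:
        p = q_function(math.sqrt(s))
        total += 2.0 * p - p * p
    return total / len(snr_linear)

# Hypothetical example: three good samples and one poor one.
snr_db = [20.0, 20.0, 20.0, 5.0]
snr_lin = [10 ** (s / 10) for s in snr_db]
ber = ber_4qam(snr_lin)
ser = ser_4qam(snr_lin)
```

In this example the single low-SNR sample contributes almost all of the average, which is one reason the cost depends on the full set of per-sample SNRs rather than on their mean.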

The DFT output can be written as

$$\mathcal{F}(\mathbf{Y}_k \mathbf{w}) = (\mathcal{F}\,\mathbf{Y}_k)\,\mathbf{w}, \tag{14}$$

where $\mathbf{Y}_k$ is a block Toeplitz matrix of size
$N \times (PL_w)$, where each $N \times L_w$ sub-block contains the data
that will be convolved with the CSE, $\mathbf{w}_p$, and successive rows
are vectors $\mathbf{y}^T(i)$ for successive values of $i$. Then
$\mathcal{F}\,\mathbf{Y}_k$ is an $N \times (PL_w)$ matrix as well, with
the $n$th row denoted by $\mathbf{y}_{k,n}$. Define $Q_{m,n}$ to be
element $(m,n)$ of the (unitary) IDFT matrix, with
$\mathbf{Q} = \mathcal{F}^H$. Passing (14) through the FEQ and the IDFT,
and taking output sample $m$ for user $l = 1$ yields

$$\hat{x}_1(k)[m] = \sum_{n=1}^{N} Q_{m,n}\, d_n\, \mathbf{y}_{k,n}\, \mathbf{w}. \tag{15}$$

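Define the correlation terms

$$\sigma_x^2 = E\left\{ |x_1(k)[m]|^2 \right\}, \tag{16}$$

$$\boldsymbol{\varphi}_{m,n} = E\left\{ x_1^*(k)[m]\, \mathbf{y}_{k,n} \right\}, \tag{17}$$

$$\mathbf{R}_{n_1,n_2} = E\left\{ \mathbf{y}_{k,n_1}^H \mathbf{y}_{k,n_2} \right\}, \tag{18}$$

which have dimensions $1 \times 1$, $1 \times PL_w$, and
$PL_w \times PL_w$, respectively. For simplicity, we assume that
$\sigma_x^2$ is independent of $m$. The output SNR of sample $m$ is the
ratio of the power of the desired signal, $x_1(k)[m]$, to the power of
the total interference and noise,

$$\mathrm{SNR}_m = \frac{\sigma_x^2}{E\{|e_{k,m}|^2\}} = \frac{\sigma_x^2}{E\{|\hat{x}_1(k)[m] - x_1(k)[m]|^2\}}. \tag{19}$$

Using (15), the denominator expands as

$$E\{|e_{k,m}|^2\} = \sum_{n_1,n_2} Q_{m,n_1}^* Q_{m,n_2}\, d_{n_1}^* d_{n_2}\, \mathbf{w}^H \mathbf{R}_{n_1,n_2} \mathbf{w} - 2\,\Re\left\{ \sum_{n} Q_{m,n}\, d_n\, \boldsymbol{\varphi}_{m,n} \mathbf{w} \right\} + \sigma_x^2. \tag{20}$$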

Estimate the correlation terms in (16), (17), and (18) from time
averages. Choose a neighborhood size $\sigma_{step}$ and a unit norm
initial guess $\mathbf{w}_{best}$, then loop through:

1. Generate $\mathbf{w}_{step}$ as a circularly Gaussian random vector,
   i.i.d. with zero mean and variance $\sigma_{step}^2$.

2. $\mathbf{w}_{trial} = \dfrac{\mathbf{w}_{best} + \mathbf{w}_{step}}{\|\mathbf{w}_{best} + \mathbf{w}_{step}\|}$

3. Use (12), (22), and (23) to compute $\mathrm{BER}_{trial}$.

4. If $\mathrm{BER}_{trial} < \mathrm{BER}_{best}$ then
   $\mathbf{w}_{best} = \mathbf{w}_{trial}$ and
   $\mathrm{BER}_{best} = \mathrm{BER}_{trial}$.

Stop at a given number of iterations or target BER.

Fig. 3. Greedy minimum error rate (G-MER) CSE design algorithm.
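
The loop of Fig. 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `cost` stands in for the BER model, which is not implemented here, and the toy cost, the `target` vector, and the parameter values are hypothetical and chosen only so that the example runs.

```python
import numpy as np

def greedy_search(cost, w_init, alpha=0.01, n_iter=1000, seed=0):
    """Greedy random search over unit-norm complex vectors."""
    rng = np.random.default_rng(seed)
    w_best = w_init / np.linalg.norm(w_init)
    c_best = cost(w_best)
    history = [c_best]
    # Per-entry standard deviation so that the RMS step norm is alpha.
    sigma_step = alpha / np.sqrt(w_best.size)
    for _ in range(n_iter):
        # Circular Gaussian: real and imaginary parts each have variance sigma^2 / 2.
        step = (sigma_step / np.sqrt(2.0)) * (
            rng.standard_normal(w_best.size) + 1j * rng.standard_normal(w_best.size)
        )
        w_trial = w_best + step
        w_trial = w_trial / np.linalg.norm(w_trial)
        c_trial = cost(w_trial)
        if c_trial < c_best:               # accept only improvements
            w_best, c_best = w_trial, c_trial
        history.append(c_best)
    return w_best, history

# Toy stand-in cost (NOT the BER model): misalignment with a fixed unit vector.
target = np.ones(4, dtype=complex) / 2.0
toy_cost = lambda w: 1.0 - abs(np.vdot(target, w)) ** 2
w0 = np.array([1.0, 0.0, 0.0, 0.0], dtype=complex)
w_best, history = greedy_search(toy_cost, w0, alpha=0.1, n_iter=500)
```

Because only improving trials are accepted, `history` is non-increasing by construction; whether it reaches the global minimum depends on the initialization, as the text discusses.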


4.  BER  MINIMIZATION  PROCEDURE 


Fig. 2. Contours of the BER for a 3-tap CSE under a unit norm
constraint. The CSE is parameterized in spherical coordinates by the
angles $\theta$ and $\phi$. The cost function is symmetric with respect
to a sign change in the CSE ($\mathbf{w} \to -\mathbf{w}$), i.e.
$(\theta, \phi) \to (\theta + \pi, \pi - \phi)$.


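The unbiased MMSE FEQ for sample $m$ is found by setting the
correlation of the input and output for sample $m$ equal to the
transmitted power for sample $m$, or equivalently

$$E\left\{ \sum_{n=1}^{N} Q_{m,n}\, d_n\, \mathbf{y}_{k,n} \mathbf{w}\; x_1^*(k)[m] \right\} = E\left\{ x_1^*(k)[m]\, x_1(k)[m] \right\},$$

$$\sum_{n=1}^{N} Q_{m,n}\, d_n\, \boldsymbol{\varphi}_{m,n} \mathbf{w} = \sigma_x^2. \tag{21}$$

With this value of the FEQ, the output SNR becomes

$$\mathrm{SNR}_m = \frac{\sigma_x^2}{\sum_{n_1,n_2} Q_{m,n_1}^* Q_{m,n_2}\, d_{n_1}^* d_{n_2}\, \mathbf{w}^H \mathbf{R}_{n_1,n_2} \mathbf{w} - \sigma_x^2}. \tag{22}$$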


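Eq. (21) can be rewritten in matrix form as

$$\mathbf{d}^T \mathbf{A}_m \mathbf{w} = \sigma_x^2, \tag{23}$$

where row $\mathbf{A}_m[n, :] = Q_{m,n}\, \boldsymbol{\varphi}_{m,n}$.
Collecting these $N$ equations into a vector and solving for
$\mathbf{d}$ yields

$$\mathbf{d} = \sigma_x^2 \begin{bmatrix} \mathbf{w}^T \mathbf{A}_1^T \\ \vdots \\ \mathbf{w}^T \mathbf{A}_N^T \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}. \tag{24}$$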
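
Solving the $N$ unbiasedness constraints for the FEQ amounts to one $N \times N$ linear solve, which the following sketch illustrates. The array layout (`phi[m, n]` holding the $1 \times PL_w$ correlation row vector for the pair $(m, n)$) and the random test data are assumptions made for the example; in practice the correlations would be estimated from training data.

```python
import numpy as np

def unbiased_feq(Q, phi, w, sigma_x2):
    """Solve sum_n Q[m, n] * d[n] * (phi[m, n] @ w) = sigma_x2 for every m.

    Q:   (N, N) unitary IDFT matrix
    phi: (N, N, K) array of correlation row vectors
    w:   (K,) stacked CSE taps
    """
    B = Q * (phi @ w)                      # B[m, n] = Q[m, n] * (phi[m, n] @ w)
    rhs = sigma_x2 * np.ones(Q.shape[0], dtype=complex)
    return np.linalg.solve(B, rhs)

rng = np.random.default_rng(1)
N, K = 4, 3
Q = np.conj(np.fft.fft(np.eye(N), norm="ortho")).T   # unitary IDFT matrix
phi = rng.standard_normal((N, N, K)) + 1j * rng.standard_normal((N, N, K))
w = rng.standard_normal(K) + 1j * rng.standard_normal(K)
w /= np.linalg.norm(w)
d = unbiased_feq(Q, phi, w, sigma_x2=1.0)
```

Using `np.linalg.solve` rather than forming an explicit inverse is a numerical design choice of the sketch, not something the paper specifies.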


Since the BER is intractable to direct minimization, a heuristic
approach is needed. The approach we propose is a greedy search. At each
iteration, we search in a neighborhood of the current best solution,
and if the new CSE has a lower BER, we accept it. The algorithm is
summarized in Fig. 3. Note that evaluation of the analytic BER model is
required in order to evaluate each tentative update.

The BER model is invariant to scaling the CSE, hence the unit norm
constraint is used for convenience. If the step size is
$\sigma_{step} = \alpha / \sqrt{PL_w}$, then the average update size is
$\left( E\{\mathbf{w}_{step}^H \mathbf{w}_{step}\} \right)^{1/2} = \alpha$.
For our simulations, we use $\alpha = 0.01$, i.e. each step has a
magnitude of about 1% of the magnitude of the current CSE. The
initialization for $\mathbf{w}_{best}$ should be a cheap-to-compute CSE
that has the best performance of all such designs, so that there is a
reasonable chance of starting in the valley of the global minimum of
the BER.

There are two drawbacks to the greedy search. First, it requires
computation of the BER model at each iteration, which is very
expensive. If $N_{corr}$ symbols are used to compute the correlation
terms, then the greedy search requires
$\frac{1}{2}(PL_w)^2 N^2 N_{corr}$ complex multiply-and-accumulate (MAC)
operations to compute the correlation terms, and a further
$(PL_w)^2 N^2$ complex MACs per iteration. Second, as with a gradient
descent method, the global minimum is only achieved if the
initialization lies somewhere in the valley of the global minimum. To
avoid becoming trapped in a local minimum, the greedy search can be
generalized to simulated annealing (SA) [14]. SA occasionally allows
upwards steps, but the probability of allowing them decreases according
to a user-defined "cooling schedule." Under certain conditions
(including infinite run time and an infinitesimally slow cooling
schedule), simulated annealing will find the global minimum of the cost
function. However, this further adds to the complexity, since a large
number of iterations is required.
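
To make the difference from the greedy rule concrete, the sketch below shows one common form of the simulated annealing acceptance test (the Metropolis criterion) with a geometric cooling schedule. Both choices are assumptions for illustration; the paper does not state which acceptance rule or schedule would be used.

```python
import math
import random

def accept(cost_trial, cost_current, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept uphill moves."""
    if cost_trial < cost_current:
        return True
    if temperature <= 0.0:
        return False                       # zero temperature reduces to the greedy rule
    return rng.random() < math.exp(-(cost_trial - cost_current) / temperature)

def geometric_cooling(t0, rate, iteration):
    """Temperature after `iteration` steps, with 0 < rate < 1 (assumed schedule)."""
    return t0 * rate ** iteration
```

Replacing the strict comparison in step 4 of Fig. 3 with `accept(...)` would give an annealed variant; since BER values span many orders of magnitude, applying the rule to the logarithm of the BER may be a more practical scaling, although that too is an assumption.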

5.  SIMULATIONS 


Together, (12), (22), and (23) allow us to evaluate the BER for a given
CSE $\mathbf{w}$ and unbiased MMSE FEQ $\mathbf{d}$ based on that CSE.

We can visualize the BER and SNR model by using a 3-tap unit norm CSE
parameterized by the two angles of spherical coordinates. The channel
is $\mathbf{h} = [1, -0.3, 0.7]$ with a desired length of
$\nu + 1 = 2$, FFT size $N = 8$, and 20 dB SNR. For each value of the
CSE, the unbiased MMSE FEQ is given by (23). Fig. 2 shows log-spaced
contours of the BER. The BER is extremely multimodal, and numerical
optimization of this cost surface is an ambitious goal.
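
The two-angle parameterization of a real 3-tap unit norm CSE can be sketched as below. The convention (with $\theta$ as the azimuth and $\phi$ as the polar angle) is an assumption, chosen because it reproduces the sign-change symmetry $(\theta, \phi) \to (\theta + \pi, \pi - \phi)$ noted in the caption of Fig. 2.

```python
import numpy as np

def cse_from_angles(theta: float, phi: float) -> np.ndarray:
    """Real 3-tap unit-norm CSE from azimuth theta and polar angle phi."""
    return np.array([
        np.sin(phi) * np.cos(theta),
        np.sin(phi) * np.sin(theta),
        np.cos(phi),
    ])

h = np.array([1.0, -0.3, 0.7])             # channel used for the visualization
theta, phi = 0.4, 1.1                      # arbitrary example angles
w = cse_from_angles(theta, phi)
w_mirror = cse_from_angles(theta + np.pi, np.pi - phi)
c = np.convolve(h, w)                      # effective channel after shortening
```

Since the BER model is invariant to scaling of the CSE, `w` and `w_mirror` give the same cost, so only half of the angle plane carries distinct information.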


The algorithms to be compared are the MSSNR design [7], the MMSE design
[5], the MDS design [15], the Min-IBI design [13], and the greedy
search (G-MER). We also compare to the BER when no CSE is used. The FFT
size is $N = 16$ and the CP length is $\nu = 4$, which are small since
the output SNR model of (22) is so expensive to compute. The signal
constellation is 4-QAM. The channels are Rayleigh fading with 10
significant taps and an approximately exponential delay profile as in
[16]. There are $L = 1$ transmit antenna and $P = 2$ receive antennas.
The CSE has $L_w = 16$ taps per antenna. The correlation parameters are
estimated using 100 blocks of training, and the Min-IBI design is used
as the initialization. The BER values will be measured over 200
independent trials (channel, input data, and noise), using 2000 blocks
of data each. The desired delay will be chosen heuristically (similar
to [16]) rather than performing a global search.

Fig. 4. BER vs. SNR for various CSE design algorithms.

Fig. 5. History of the BER model for one run of the G-MER algorithm.
The SNR was 15 dB.

Fig. 4 shows the measured BER versus SNR in dB, for this SCCP system;
and Fig. 5 shows the calculated BER versus iteration number, for the
greedy search. Note that the BER model of (12), (22), and (23) is only
used to perform the greedy search, and the actual BER assessment in
Fig. 4 uses the actual measured BER, not the model. Aside from the
greedy search, the channel shorteners considered (MDS, MSSNR, MMSE, and
Min-IBI) are the only ones that the authors are aware of that do not
explicitly take into account the multicarrier signal structure, hence
they are the only ones that can be directly applied to the SCCP system
for comparison. The design that performs the best by far is the greedy
search (G-MER). However, it is too computationally intensive for
real-time implementation; of the remaining designs, the Min-IBI design
performs the best. Clearly, there is a need for a new design that is
computationally cheaper than the greedy search but performs better than
the Min-IBI design.